An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes
Authors
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov decision process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random sub-MDP problems based on information obtained from random sampling of the entire action space and from local search. Each sub-MDP is then solved approximately using a variant of the standard policy-improvement technique, which yields an elite policy. We show that the sequence of elite policies converges to an optimal policy with probability one. Numerical studies illustrate the algorithm and compare it with existing procedures.
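The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a small finite MDP with an integer-indexed action grid, the elite policy is built by a simple per-state policy-switching rule standing in for the paper's policy-improvement variant, and every name here (`random_mdp`, `evaluate`, `erps_sketch`, `q0`, `radius`) is hypothetical.

```python
import random

def random_mdp(n_states, n_actions, seed=0):
    """Build a random finite MDP: P[s][a] is a distribution over next states, c[s][a] a cost."""
    rng = random.Random(seed)
    P, c = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        c.append([rng.random() for _ in range(n_actions)])
    return P, c

def evaluate(policy, P, c, gamma, sweeps=500):
    """Approximate the discounted cost of a fixed policy by repeated Bellman backups."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [c[s][policy[s]]
             + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
             for s in range(n)]
    return V

def erps_sketch(P, c, gamma=0.9, pop_size=8, iters=20, q0=0.5, radius=1, seed=0):
    """Illustrative ERPS-style loop (assumed details, see the lead-in)."""
    rng = random.Random(seed)
    n_states, n_actions = len(c), len(c[0])
    # Initial population: policies sampled uniformly from the action space.
    population = [[rng.randrange(n_actions) for _ in range(n_states)]
                  for _ in range(pop_size)]
    elite = population[0]
    for _ in range(iters):
        # The population induces a sub-MDP: per state, only actions used by
        # some policy in the population are considered.
        values = [evaluate(pi, P, c, gamma) for pi in population]
        # Elite policy: per state, copy the action of the lowest-cost policy.
        elite = []
        for s in range(n_states):
            best = min(range(pop_size), key=lambda i: values[i][s])
            elite.append(population[best][s])
        # Next population: keep the elite, then mix local search around the
        # elite action (probability q0) with uniform sampling of all actions.
        population = [elite]
        for _ in range(pop_size - 1):
            child = []
            for s in range(n_states):
                if rng.random() < q0:
                    a = elite[s] + rng.randint(-radius, radius)
                    a = min(max(a, 0), n_actions - 1)
                else:
                    a = rng.randrange(n_actions)
                child.append(a)
            population.append(child)
    return elite, evaluate(elite, P, c, gamma)

if __name__ == "__main__":
    P, c = random_mdp(n_states=4, n_actions=6, seed=1)
    elite, V = erps_sketch(P, c, pop_size=6, iters=10, seed=2)
    print("elite policy:", elite)
    print("elite values:", [round(v, 4) for v in V])
```

Because the elite policy is carried into the next population, its cost cannot increase from one iteration to the next in this sketch, while uniform sampling keeps every action reachable. A genuinely uncountable action space would need a continuous sampler and a distance-based neighborhood in place of the integer grid assumed here.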
Similar resources
An Evolutionary Random Search Algorithm for Solving Markov Decision Processes
heterogeneous and dynamic problems of engineering technology and systems for industry and government. ISR is a permanent institute of the University of Maryland, within the Glenn L. Martin Institute of Technology/A. James Clark School of Engineering. It is a National Science Foundation Engineering Research Center. Web site http://www.isr.umd.edu I R INSTITUTE FOR SYSTEMS RESEARCH TECHNICAL RESE...
A Genetic Search In Policy Space For Solving Markov Decision Processes
Markov Decision Processes (MDPs) have been studied extensively in the context of decision making under uncertainty. This paper presents a new methodology for solving MDPs, based on genetic algorithms. In particular, the importance of discounting in the new framework is dealt with and applied to a model problem. Comparison with the policy iteration algorithm from dynamic programming reveals the ...
Solving Multi-objective Optimal Control Problems of chemical processes using Hybrid Evolutionary Algorithm
Evolutionary algorithms have been recognized to be suitable for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. This paper applies an evolutionary optimization scheme, inspired by Multi-objective Invasive Weed Optimization (MOIWO) and Non-dominated Sorting (NS) strategi...
A Survey of Some Simulation-based Algorithms for Markov Decision Processes
Many problems modeled by Markov decision processes (MDPs) have very large state and/or action spaces, leading to the well-known curse of dimensionality that makes solution of the resulting models intractable. In other cases, the system of interest is complex enough that it is not feasible to explicitly specify some of the MDP model parameters, but simulated sample paths can be readily generated...
Integrating value functions and policy search for continuous Markov Decision Processes
Value function approaches for Markov decision processes have been used successfully to find optimal policies for a large number of problems. Recent findings have demonstrated that policy search can be used effectively in reinforcement learning when standard value function techniques become overwhelmed by the size and dimensionality of the state space. We demonstrate that substantial benefits ca...
Journal title:
- INFORMS Journal on Computing
Volume 19, Issue
Pages -
Publication date 2007